Results 1 - 20 of 428
1.
Insights Imaging ; 15(1): 104, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589691

ABSTRACT

OBJECTIVE: The aim of this study was to evaluate and compare the reliability, costs, and radiation dose of dual-energy X-ray absorptiometry (DXA) with those of MRI and CT in measuring muscle mass for the diagnosis of sarcopenia. METHODS: Thirty-four consecutive DXA scans performed in surgically menopausal women from November 2019 until March 2020 were analyzed by two observers. Each observer analyzed the lower-limb muscle mass in every scan twice. Reliability was assessed by calculating inter- and intra-observer variability. Reliability data for CT and MRI, as well as radiation doses for CT and DXA, were collected from the literature. Costs for each type of scan were calculated according to the guidelines for economic evaluation of the Dutch National Health Care Institute. RESULTS: The 34 participants had a median age of 58 years (IQR 53-65) and a median body mass index of 24.6 (IQR 21.7-29.7). Inter-observer variability had an intraclass correlation coefficient (ICC) of 0.997 (95% CI 0.994-0.998) with a relative variability of 0.037 ± 0.022%. Regarding intra-observer variability, observer 1 had an ICC of 0.998 (95% CI 0.996-0.999) with a relative variability of 0.019 ± 0.016%, and observer 2 had an ICC of 0.997 (95% CI 0.993-0.998) with a relative variability of 0.016 ± 0.011%. DXA costs were €62, CT €77, and MRI €195. The estimated radiation dose was 2.5-3.0 mSv for CT and 2-4 µSv for DXA. CONCLUSIONS: Compared with CT and MRI, DXA has lower costs and a lower radiation dose, with low inter- and intra-observer variability, for assessing lower limb muscle mass. TRIAL REGISTRATION: Netherlands Trial Register; NL8068. CRITICAL RELEVANCE STATEMENT: DXA is a good alternative to CT and MRI for assessing lower limb muscle mass, with lower costs and a lower radiation dose, while inter-observer and intra-observer variability are low. KEY POINTS: • Screening for sarcopenia should be optimized as the population ages. • DXA outperformed CT and MRI in the measured metrics. • DXA validity should be further evaluated as an alternative to CT and MRI for sarcopenia evaluation.
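The abstract reports ICCs but does not state which ICC form was used. Purely as an illustration, the sketch below computes the two-way random-effects, absolute-agreement, single-measurement ICC (often written ICC(2,1)) from two-way ANOVA mean squares. The choice of this form, the function name `icc_2_1`, and the example values are assumptions, not the authors' method.

```python
from statistics import fmean


def icc_2_1(scores: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores[i][j] is the value (e.g. lower-limb lean mass) of subject i read by
    observer j; the table must be complete (every subject read by every observer).
    """
    n = len(scores)     # number of subjects
    k = len(scores[0])  # number of observers
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 observers")

    grand = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]
    col_means = [fmean(row[j] for row in scores) for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), observers (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical example: observer 2 reads every subject exactly 1 unit higher.
print(icc_2_1([[1, 2], [3, 4], [5, 6]]))  # ≈ 0.889 (8/9)
```

Because this form measures absolute agreement, a constant offset between observers lowers the ICC even when their rankings agree perfectly, as the example shows.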

2.
Histopathology ; 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-38571446

ABSTRACT

AIMS: Following the increased use of neoadjuvant therapy for pancreatic cancer, grading of tumour regression (TR) has become part of routine diagnostics. However, it suffers from marked interobserver variation, which is mainly ascribed to the subjectivity of the criteria defining the categories in TR grading systems. We hypothesized that a further cause of the interobserver variation is the use of divergent and nonspecific morphological criteria to identify tumour regression. METHODS AND RESULTS: Twenty treatment-naïve pancreatic cancers and 20 pancreatic cancers treated with neoadjuvant chemotherapy were reviewed by three experienced pancreatic pathologists who, blinded to treatment status, categorized each tumour as treatment-naïve or neoadjuvantly treated and annotated all tissue areas they considered to show tumour regression. Only 50%-65% of the cases were categorized correctly, and the annotated tissue areas were highly discrepant (only 3%-41% overlap). When the prevalence of various morphological features deemed to indicate TR was compared between treatment-naïve and neoadjuvantly treated tumours, only one pattern, characterized by reduced cancer cell density and prominent stroma affecting a large area of the tumour bed, occurred significantly more frequently, but not exclusively, in the neoadjuvantly treated group. Finally, stromal features, both morphological and biological, were investigated as possible markers of tumour regression but failed to distinguish TR from native tumour stroma. CONCLUSION: There is considerable divergence of opinion among pathologists when it comes to identifying tumour regression. TR can be identified reliably only when it is extensive; lesser degrees of treatment effect cannot be recognized with certainty.
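The abstract gives the overlap of the annotated areas (3%-41%) without defining the measure. One common choice, assumed here, is intersection over union of pixel masks. The sketch below treats each pathologist's annotation as a set of pixel coordinates; the function name and data are hypothetical.

```python
def overlap_fraction(*annotations: set[tuple[int, int]]) -> float:
    """Share of the total annotated area marked by every annotator: |A ∩ B ∩ ...| / |A ∪ B ∪ ...|.

    Each annotation is a set of (row, col) pixel coordinates marked as tumour regression.
    """
    if not annotations:
        raise ValueError("need at least one annotation")
    union = set().union(*annotations)
    if not union:
        return 0.0  # convention chosen here: nothing annotated means no overlap
    shared = set.intersection(*annotations)
    return len(shared) / len(union)


# Hypothetical 2x2-pixel example with two annotators sharing one pixel.
a = {(0, 0), (0, 1)}
b = {(0, 1), (1, 1)}
print(overlap_fraction(a, b))  # 0.333... (1 shared pixel out of 3 annotated)
```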

3.
Eur Radiol ; 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38488970

ABSTRACT

BACKGROUND: The Paris classification categorises colorectal polyp morphology. Interobserver agreement for the Paris classification has been assessed at optical colonoscopy (OC) but not at CT colonography (CTC). We aimed to determine the following: (1) interobserver agreement between radiologists for the Paris classification using CTC; (2) whether radiologist experience influenced classification, gross polyp morphology, or polyp size; and (3) the extent to which radiologist classifications agreed with (a) colonoscopy and (b) a combined reference standard. METHODS: Following ethical approval for this non-randomised prospective cohort study, seven radiologists from three hospitals classified 52 colonic polyps using the Paris system. We calculated interobserver agreement using Fleiss kappa and mean pairwise agreement (MPA). Absolute agreement was calculated between radiologists, between CTC and OC, and between CTC and a combined reference standard using all available imaging, colonoscopic, and histopathological data. RESULTS: Overall interobserver agreement between the seven readers was fair (Fleiss kappa 0.33; 95% CI 0.30-0.37; MPA 49.7%). Readers with experience of < 1500 CTC examinations had higher interobserver agreement (Fleiss kappa 0.42 (95% CI 0.35-0.48) vs. 0.33 (95% CI 0.25-0.42)) and MPA (69.2% vs. 50.6%) than readers with ≥ 1500 examinations of experience. There was substantial overall agreement for flat vs. protuberant polyps (Fleiss kappa 0.62 (95% CI 0.56-0.68)), with an MPA of 87.9%. Agreement between CTC and OC classifications was only 44%, and CTC agreement with the combined reference standard was 56%. CONCLUSION: Radiologist agreement when using the Paris classification at CT colonography is low, and radiologist classification agrees poorly with colonoscopy. Using the full Paris classification in routine CTC reporting is of questionable value. CLINICAL RELEVANCE STATEMENT: Interobserver agreement for radiologists using the Paris classification to categorise colorectal polyp morphology is only fair; the value of routinely using the full Paris classification at CT colonography is questionable. KEY POINTS: • Overall interobserver agreement for the Paris classification at CT colonography (CTC) was only fair, and lower than for colonoscopy. • Agreement was higher for radiologists with experience of < 1500 CTC examinations and for larger polyps. There was substantial agreement when classifying polyps as protuberant vs. flat. • Agreement between CTC and colonoscopic polyp classification was low (44%).
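Fleiss kappa and mean pairwise agreement (MPA), the two statistics named above, can be sketched in plain Python as follows. The input layout (one row per polyp, one column per reader), the function names, and the ratings are illustrative assumptions; the labels are Paris classification codes used only as example strings. With complete data, the MPA equals the mean per-subject agreement term inside Fleiss' kappa, which gives a convenient self-check.

```python
from collections import Counter
from itertools import combinations


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa; ratings[i][r] is the category reader r gave polyp i (complete data)."""
    n_sub = len(ratings)
    n_rat = len(ratings[0])
    counts = [Counter(row) for row in ratings]
    # Per-subject agreement: share of reader pairs that agree on subject i.
    p_i = [(sum(c * c for c in cnt.values()) - n_rat) / (n_rat * (n_rat - 1)) for cnt in counts]
    p_bar = sum(p_i) / n_sub
    # Overall category proportions give the chance-expected agreement.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_e = sum((t / (n_sub * n_rat)) ** 2 for t in totals.values())
    return (p_bar - p_e) / (1 - p_e)  # undefined if every rating is the same category


def mean_pairwise_agreement(ratings: list[list[str]]) -> float:
    """Mean, over all reader pairs, of the proportion of polyps on which the pair agrees."""
    n_rat = len(ratings[0])
    pair_scores = [
        sum(row[a] == row[b] for row in ratings) / len(ratings)
        for a, b in combinations(range(n_rat), 2)
    ]
    return sum(pair_scores) / len(pair_scores)


# Hypothetical ratings: 4 polyps, 3 readers.
polyps = [
    ["0-Is", "0-Is", "0-Is"],
    ["0-IIa", "0-IIa", "0-IIa"],
    ["0-Is", "0-Is", "0-IIa"],
    ["0-Is", "0-IIa", "0-IIa"],
]
print(fleiss_kappa(polyps))             # ≈ 0.333
print(mean_pairwise_agreement(polyps))  # ≈ 0.667
```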

4.
Radiother Oncol ; 194: 110196, 2024 May.
Article in English | MEDLINE | ID: mdl-38432311

ABSTRACT

BACKGROUND AND PURPOSE: Studies investigating the application of artificial intelligence (AI) in radiotherapy vary substantially in quality. The goal of this study was to assess transparency and bias when scoring articles, with a specific focus on AI-based segmentation and treatment planning, using modified PROBAST and TRIPOD checklists, in order to provide recommendations for future guideline developers and reviewers. MATERIALS AND METHODS: The TRIPOD and PROBAST checklist items were discussed and modified using a Delphi process. After consensus was reached, 2 groups of 3 co-authors scored 2 articles to evaluate usability and further optimize the adapted checklists. Finally, 10 articles were scored by all co-authors. Fleiss' kappa was calculated to assess the reliability of agreement between observers. RESULTS: Three of the 37 TRIPOD items and 5 of the 32 PROBAST items were deemed irrelevant. General terminology in the items (e.g., multivariable prediction model, predictors) was modified to align with AI-specific terms. After the first scoring round, further improvements to the items were formulated, e.g., avoiding sub-questions and subjective words, and adding clarifications on how to score an item. Using the final consensus list to score the 10 articles, only 2 of the 61 items yielded a statistically significant kappa of 0.4 or more, demonstrating substantial agreement. For 41 items, no statistically significant kappa was obtained, indicating that the level of agreement among multiple observers could not be distinguished from chance. CONCLUSION: Our study showed low reliability scores with the adapted TRIPOD and PROBAST checklists. Although such checklists have shown great value during development and reporting, this raises concerns about their applicability for objectively scoring scientific articles on AI applications. When developing or revising guidelines, it is essential to consider their applicability for scoring articles without introducing bias.


Subjects
Artificial Intelligence; Checklist; Delphi Technique; Radiotherapy Planning, Computer-Assisted; Humans; Radiotherapy Planning, Computer-Assisted/methods; Radiotherapy Planning, Computer-Assisted/standards; Practice Guidelines as Topic; Bias; Reproducibility of Results; Neoplasms/radiotherapy

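The abstract counts items with a "statistically significant kappa". A minimal sketch of one standard way to test Fleiss' kappa against zero, using the large-sample null variance usually attributed to Fleiss, Nee and Landis (1979), is below. Whether the authors used this particular test is not stated; the count-matrix input (one row per scored article, one column per score category) and the function name are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def fleiss_kappa_test(counts: list[list[int]]) -> tuple[float, float, float]:
    """Return (kappa, SE under H0: kappa = 0, two-sided p-value).

    counts[i][j] = number of raters who gave article i score category j; every row must
    sum to the same number of raters n.
    """
    n_items = len(counts)
    n = sum(counts[0])
    if any(sum(row) != n for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    k = len(counts[0])
    p = [sum(row[j] for row in counts) / (n_items * n) for j in range(k)]
    q = [1 - pj for pj in p]

    # Point estimate: observed vs. chance-expected agreement.
    p_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    p_bar = sum(p_i) / n_items
    p_e = sum(pj * pj for pj in p)
    kappa = (p_bar - p_e) / (1 - p_e)

    # Large-sample variance of kappa under the null hypothesis of chance agreement.
    s_pq = sum(pj * qj for pj, qj in zip(p, q))
    s_pqqp = sum(pj * qj * (qj - pj) for pj, qj in zip(p, q))
    var0 = 2 / (s_pq ** 2 * n_items * n * (n - 1)) * (s_pq ** 2 - s_pqqp)
    se0 = sqrt(var0)
    z = kappa / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return kappa, se0, p_value


# Hypothetical item: 4 articles, 3 raters, yes/no scores.
kappa, se0, p_value = fleiss_kappa_test([[3, 0], [0, 3], [2, 1], [1, 2]])
print(round(kappa, 3), round(se0, 3))  # 0.333 0.289
```

In this toy example a kappa of 0.33 is far from significant, which illustrates how a small number of scored articles limits the power of such a test.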
5.
BMC Med Res Methodol ; 24(1): 61, 2024 Mar 09.
Article in English | MEDLINE | ID: mdl-38461273

ABSTRACT

BACKGROUND: The provision of data sharing statements (DSS) for clinical trials has been made mandatory by various stakeholders. DSS are a device to clarify whether there is an intention to share individual participant data (IPD). What is missing is a detailed assessment of whether DSS provide clear and understandable information about the conditions for sharing IPD for secondary use. METHODS: A random sample of 200 COVID-19 clinical trials with explicit DSS was drawn from the ECRIN clinical research metadata repository. The DSS were assessed and classified by two experienced experts and one assessor with less experience in data sharing (DS) into different categories (unclear, no sharing, no plans, yes but vague, yes on request, yes with specified storage location, yes but with complex conditions). RESULTS: Agreement between the two experts was moderate to substantial (kappa = 0.62, 95% CI [0.55, 0.70]). Agreement decreased considerably when these experts were compared with a third person who was less experienced and trained in data sharing ("assessor") (kappa = 0.33, 95% CI [0.25, 0.41]; 0.35, 95% CI [0.27, 0.43]). For the cases where the two experts had disagreed, they reached a consensus under the supervision of an independent moderator, and the result was used as the "gold standard" for further analysis. At least some degree of willingness to share data was expressed in 63.5% (127/200) of cases. Of these cases, around one quarter (31/127) were vague statements of support for data sharing without useful detail. In around half of the cases (60/127), it was stated that IPD could be obtained on request. Only in slightly more than 10% of the cases (15/127) was it stated that the IPD would be transferred to a specific data repository. In the remaining cases (21/127), a more complex regime was described or referenced, which could not be allocated to one of the three previous groups. As a result of the consensus meetings, the classification system was updated. CONCLUSION: The study showed that current DSS that imply possible data sharing are often not easy to interpret, even by relatively experienced staff. Machine-based interpretation, which would be necessary for any practical application, is currently not possible. Machine learning and/or natural language processing techniques might improve machine actionability but would represent a very substantial investment of research effort. The cheaper and easier option would be for data providers, data requestors, funders, and platforms to adopt a clearer, more structured, and more standardised approach to specifying, providing, and collecting DSS. TRIAL REGISTRATION: The protocol for the study was pre-registered on ZENODO (https://zenodo.org/record/7064624#.Y4DIAHbMJD8).


Subjects
Information Dissemination; Research Design; Humans; Information Dissemination/methods; Consensus; Registries

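Cohen's kappa for the two experts' DSS classifications could be computed as in the sketch below. The example labels reuse the study's category names, but the ratings themselves are made up, the function name is hypothetical, and the confidence intervals reported above are not reproduced here.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Unweighted Cohen's kappa for two raters assigning nominal categories to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must classify the same non-empty set of items")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions, summed over categories.
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical classifications of four DSS by two experts.
expert_1 = ["yes on request", "yes on request", "no sharing", "no sharing"]
expert_2 = ["yes on request", "no sharing", "no sharing", "no sharing"]
print(cohens_kappa(expert_1, expert_2))  # 0.5
```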
6.
Stat Methods Med Res ; 33(3): 532-553, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38320802

ABSTRACT

Reliability of measurement instruments providing quantitative outcomes is usually assessed by an intraclass correlation coefficient. When participants are repeatedly measured by a single rater or device, or are each rated by a different group of raters, the intraclass correlation coefficient is based on a one-way analysis of variance model. When planning a reliability study, it is essential to determine the number of participants and the number of measurements per participant (i.e. number of raters or number of repeated measurements). Three different sample size determination approaches under the one-way analysis of variance model were identified in the literature, all based on a confidence interval for the intraclass correlation coefficient. Although eight different confidence interval methods can be identified, the Wald confidence interval with Fisher's large-sample variance approximation remains the most commonly used, despite its well-known poor statistical properties. Therefore, a first objective of this work is to compare the statistical properties of all identified confidence interval methods, including those overlooked in previous studies. A second objective is to develop a general procedure for determining the sample size under all approaches, since a closed-form formula is not always available. This procedure is implemented in an R Shiny app. Finally, we provide advice on choosing an appropriate sample size determination method when planning a reliability study.


Subjects
Sample Size; Humans; Reproducibility of Results; Observer Variation; Analysis of Variance

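For orientation, the point estimate of the one-way ICC discussed above can be computed from the one-way ANOVA mean squares. This is a sketch of the standard ICC(1) formula only, with a hypothetical function name; it does not implement any of the eight confidence-interval methods or the authors' sample-size procedure and R Shiny app.

```python
from statistics import fmean


def icc_oneway(scores: list[list[float]]) -> float:
    """One-way random-effects, single-measurement ICC(1) from one-way ANOVA mean squares.

    scores[i] holds the k repeated measurements (or ratings by different raters) of
    participant i; every participant must have the same number k >= 2 of measurements.
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 participants, each with the same k >= 2 measurements")
    grand = fmean(x for row in scores for x in row)
    means = [fmean(row) for row in scores]
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum((x - m) ** 2 for row, m in zip(scores, means) for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical data: 3 participants, 2 measurements each.
print(icc_oneway([[1, 2], [3, 4], [5, 6]]))  # ≈ 0.882 (15/17)
```

Unlike a two-way model, the one-way model cannot separate a systematic rater offset from random error, so any such offset is absorbed into the within-participant mean square.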
7.
Pathologie (Heidelb) ; 45(2): 115-123, 2024 Mar.
Article in German | MEDLINE | ID: mdl-38381370

ABSTRACT

BACKGROUND: Metabolic dysfunction-associated steatotic liver disease (MASLD), or non-alcoholic fatty liver disease (NAFLD), is a common disease that is diagnosed through manual evaluation of liver biopsies, an assessment that is subject to high interobserver variability (IBV). IBV can be reduced using automated methods. OBJECTIVES: Many existing computer-based methods do not accurately reflect what pathologists evaluate in practice. The goal is to demonstrate how these differences affect the prediction of hepatic steatosis. Additionally, IBV complicates algorithm validation. MATERIALS AND METHODS: Forty tissue sections were analyzed to detect steatosis, nuclei, and fibrosis. Data generated from automated image processing were used to predict steatosis grades. To investigate IBV, 18 liver biopsies were evaluated by multiple observers. RESULTS: Area-based approaches yielded more strongly correlated results than nucleus-based methods (mean Spearman ρ = 0.92 vs. 0.79). Including information on tissue composition reduced the mean absolute error of area- and nucleus-based predictions by 0.5% and 2.2%, respectively. Our final area-based algorithm, incorporating tissue structure information, achieved high accuracy (80%) and a strong correlation (mean Spearman ρ = 0.94) with manual evaluation. CONCLUSION: Automatic, deterministic evaluation of steatosis can be improved by integrating information about tissue composition and can help reduce the influence of IBV.


Subjects
Non-alcoholic Fatty Liver Disease; Humans; Non-alcoholic Fatty Liver Disease/diagnosis; Biopsy; Fibrosis; Automation

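The correlations above are Spearman coefficients. A stdlib-only sketch is below: it ranks both variables, averaging ranks over ties (which matters for ordinal steatosis grades), and takes the Pearson correlation of the ranks. The function names and example data are hypothetical.

```python
from statistics import fmean


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = fmean(rx), fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical: manual steatosis grades (with a tie) vs. algorithm-estimated fat fraction.
print(spearman_rho([1, 2, 2, 3], [5.0, 12.0, 20.0, 41.0]))  # ≈ 0.949
```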
8.
Cancer Med ; 13(2): e6967, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38348960

ABSTRACT

RATIONALE AND OBJECTIVES: Computer-aided detection (CAD) of pulmonary nodules reduces the impact of observer variability, improving the reliability and reproducibility of nodule assessments in clinical practice. This study therefore aimed to assess the impact of CAD on inter-observer agreement in the follow-up management of subsolid nodules. MATERIALS AND METHODS: A dataset of 60 subsolid nodule cases was constructed from the National Cancer Center lung cancer screening data. Five observers independently assessed all low-dose computed tomography scans and assigned a follow-up management strategy to each case according to the National Comprehensive Cancer Network (NCCN) guidelines, using both manual measurements and CAD assistance. Linearly weighted Cohen's kappa was used to measure agreement between paired observers, and agreement among multiple observers was evaluated using the Fleiss kappa statistic. RESULTS: Agreement of the five observers on NCCN follow-up management categorization was moderate with manual measurement (Fleiss kappa 0.437). With CAD, agreement improved notably, reaching substantial agreement (Fleiss kappa 0.623). After CAD was used, the proportions of major and substantial management discrepancies decreased from 27.5% to 15.8% and from 4.8% to 1.5%, respectively (p < 0.01). In the 23 lung cancer cases presenting as part-solid nodules, CAD significantly increased the average detection sensitivity (overall sensitivity, 82.6% vs. 92.2%; p < 0.05). CONCLUSION: CAD significantly improves inter-observer agreement on the follow-up management strategy for subsolid nodules. It also shows potential to reduce substantial management discrepancies and to increase detection sensitivity in lung cancer cases presenting as part-solid nodules.


Subjects
Lung Neoplasms; Humans; Lung Neoplasms/diagnostic imaging; Reproducibility of Results; Early Detection of Cancer; Observer Variation; Follow-Up Studies; Computers

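The linearly weighted Cohen's kappa used for paired observers penalises disagreements in proportion to their distance on an ordinal scale. The sketch below assumes the management categories are coded as integers 0..K-1 in order of follow-up intensity; that coding, the function name, and the example ratings are assumptions, not taken from the NCCN guidelines or the study.

```python
def linear_weighted_kappa(rater_a: list[int], rater_b: list[int], n_categories: int) -> float:
    """Cohen's kappa with linear disagreement weights |i - j| / (K - 1) for categories 0..K-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    k = n_categories
    n = len(rater_a)
    # Joint distribution of the two raters' categories (proportions).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        return abs(i - j) / (k - 1)

    obs_dis = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - obs_dis / exp_dis


# Hypothetical: two observers, four nodules, three ordinal management categories.
print(linear_weighted_kappa([0, 0, 1, 1], [0, 1, 1, 2], n_categories=3))  # ≈ 0.333
```

A one-step disagreement costs half as much as a two-step disagreement here, which is why weighted kappa suits ordered follow-up categories better than unweighted kappa.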
9.
Global Spine J ; : 21925682241235607, 2024 Feb 21.
Article in English | MEDLINE | ID: mdl-38382044

ABSTRACT

STUDY DESIGN: Reliability analysis. OBJECTIVES: Vertebral pelvic angles (VPA) are gaining popularity given their ability to describe the shape of the spine. Understanding their reliability and minimal detectable change (MDC) is necessary to determine how these measurement tools should be used in the manual assessment of spine radiographs. Our aim was to assess intra- and interobserver intraclass correlation coefficients (ICC) and the MDC of VPA for assessing alignment in adult spinal deformity (ASD). METHODS: Three independent examiners blindly measured the T1, T4, T9, L1, and L4 pelvic angles (PA) twice in ASD patients, with a 4-week interval between the two measurement sessions. Patients who had undergone hip or shoulder arthroplasty, who had fused or transitional vertebrae, or whose hip joints were not visible on radiographs were excluded. A power analysis calculated a minimum sample size of 19. Both intra- and interobserver ICC and the MDC, which denotes the smallest change in the true value that can be detected with 95% confidence, were calculated. RESULTS: Of the 193 patients, 39 were ultimately included in the study, and 390 measurements were performed by 3 raters. Intraobserver ICC values ranged from .90 to .99. The interobserver ICC was .97, .97, .96, .95, and .92, and the MDC was 5.3°, 5.1°, 4.8°, 4.9°, and 4.1° for T1, T4, T9, L1, and L4PA, respectively. CONCLUSION: All VPAs showed excellent intra- and interobserver reliability; however, the MDC is relatively high compared with typical ranges of VPA values. Surgeons must therefore be aware that substantial alignment changes may not be detected by a single VPA.
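The abstract does not give its MDC formula. A widely used convention, assumed here, derives it from the standard error of measurement, SEM = SD × √(1 − ICC), as MDC95 = 1.96 × √2 × SEM. The sketch below follows that convention; the function name and the SD in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def minimal_detectable_change(sd: float, icc: float, confidence: float = 0.95) -> float:
    """MDC = z * sqrt(2) * SEM, with SEM = SD * sqrt(1 - ICC).

    sd is the between-subject standard deviation of the angle (degrees); the sqrt(2)
    factor reflects that a change is the difference of two error-prone measurements.
    """
    if not 0 <= icc <= 1:
        raise ValueError("ICC must lie in [0, 1]")
    sem = sd * sqrt(1 - icc)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # ≈ 1.96 for 95%
    return z * sqrt(2) * sem


# Hypothetical: an angle with SD 10° across patients and ICC 0.96.
print(round(minimal_detectable_change(10.0, 0.96), 2))  # 5.54
```

The formula shows why a high ICC can coexist with a large MDC: the MDC scales with the spread of the angle across patients, not just with the ICC.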

11.
Arch Orthop Trauma Surg ; 144(3): 1149-1159, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38231206

ABSTRACT

INTRODUCTION: Although non-contrast magnetic resonance imaging (MRI) is the most commonly used examination today, few studies have evaluated the accuracy of its findings. The primary objective of the study was to evaluate the sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of non-contrast MRI findings in frozen shoulder, both in isolation and in combination. The secondary objectives were to define the interobserver and intraobserver agreement of the assessments and the odds ratio for frozen shoulder associated with the various MRI findings. METHODS: A retrospective diagnostic accuracy study comparing non-contrast MRI findings between a frozen shoulder group and a control group. Sensitivity, specificity, positive and negative predictive values, accuracy, odds ratio, and interobserver and intraobserver agreement were calculated for each finding and their possible associations. RESULTS: Hyperintensity of the capsule in the axillary recess showed 84% sensitivity, 94% specificity, and 89% accuracy. Obliteration of the subcoracoid fat triangle in the rotator interval had 34% sensitivity, 82% specificity, and 58% accuracy. Coracohumeral ligament thickness ≥ 2 mm had 66% sensitivity, 48% specificity, and 57% accuracy. Capsule thickness in the axillary recess ≥ 4 mm resulted in 54% sensitivity, 82% specificity, and 68% accuracy. Regarding interobserver agreement, only the posteroinferior and posterosuperior quadrants showed moderate results; all the others showed strong reliability. The odds ratio for frozen shoulder given hyperintensity in the axillary recess was 82.3. Combining these findings increased specificity (95%). CONCLUSION: The accuracy of non-contrast magnetic resonance imaging for diagnosing frozen shoulder is high, especially when evaluating hyperintensity of the axillary recess. The examination has high reliability and reproducibility. A combination of signs increases the specificity of the test. LEVEL OF EVIDENCE: Level III, study of diagnostic test.


Subjects
Bursitis; Shoulder Joint; Humans; Retrospective Studies; Reproducibility of Results; Shoulder Joint/pathology; Magnetic Resonance Imaging/methods; Bursitis/diagnostic imaging; Sensitivity and Specificity

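All of the diagnostic metrics listed in this abstract follow from a 2 × 2 table of MRI finding versus diagnosis. A minimal sketch with a hypothetical `TwoByTwo` class and made-up counts is below; it applies no continuity correction, so the odds ratio is undefined when a false-positive or false-negative cell is zero.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # finding present, frozen shoulder
    fp: int  # finding present, control
    fn: int  # finding absent, frozen shoulder
    tn: int  # finding absent, control

    def metrics(self) -> dict[str, float]:
        total = self.tp + self.fp + self.fn + self.tn
        return {
            "sensitivity": self.tp / (self.tp + self.fn),
            "specificity": self.tn / (self.tn + self.fp),
            "ppv": self.tp / (self.tp + self.fp),
            "npv": self.tn / (self.tn + self.fn),
            "accuracy": (self.tp + self.tn) / total,
            "odds_ratio": (self.tp * self.tn) / (self.fp * self.fn),  # no zero-cell correction
        }


# Hypothetical counts, not the study's data.
print(TwoByTwo(tp=40, fp=5, fn=10, tn=45).metrics()["odds_ratio"])  # 36.0
```

Unlike sensitivity and specificity, accuracy, PPV, and NPV depend on the ratio of cases to controls, so they do not transfer directly to populations with a different prevalence.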
12.
Eur J Nucl Med Mol Imaging ; 51(6): 1741-1752, 2024 May.
Article in English | MEDLINE | ID: mdl-38273003

ABSTRACT

PURPOSE: Prostate-specific membrane antigen (PSMA) positron emission tomography/computed tomography (PET/CT) is recognized as the most accurate imaging modality for detecting metastatic high-risk prostate cancer (PCa). Its role in local staging of the disease is still unclear. We assessed the intra- and interobserver variability, as well as the diagnostic accuracy, of the PSMA PET/CT-based molecular imaging local tumour stage (miT-stage) for local tumour stage assessment in a large, multicentre cohort of patients with intermediate- and high-risk primary PCa, with the radical prostatectomy specimen (pT-stage) serving as the reference standard. METHODS: A total of 600 patients who underwent staging PSMA PET/CT before robot-assisted radical prostatectomy were studied. In 579 PSMA-positive primary prostate tumours, the miT-stage as assessed by four nuclear medicine physicians was compared with the pT-stage according to the ISUP protocol. Sensitivity, specificity, and diagnostic accuracy were determined. In a representative subset of 100 patients, intra- and interobserver variability were assessed using kappa estimates. RESULTS: The sensitivity and specificity of the PSMA PET/CT-based miT-stage were 58% and 59% for pT3a-stage, 30% and 97% for ≥ pT3b-stage, and 68% and 61% for overall ≥ pT3-stage, respectively. No statistically significant differences in diagnostic accuracy were found between tracers. Intra-observer agreement for PSMA PET/CT assessment was substantial for ≥ T3-stage (κ = 0.70) and ≥ T3b-stage (κ = 0.75), whereas interobserver agreement was moderate for ≥ T3-stage (κ = 0.47) and ≥ T3b-stage (κ = 0.41). CONCLUSION: In a large, multicentre study of 600 patients with newly diagnosed intermediate- and high-risk PCa, we showed that PSMA PET/CT may have value in local tumour staging when the pathological tumour stage in the radical prostatectomy specimen is used as the reference standard. The intra- and interobserver variability of the assessment of tumour extent on PSMA PET/CT was moderate to substantial.


Subjects
Antigens, Surface; Glutamate Carboxypeptidase II; Neoplasm Staging; Observer Variation; Positron Emission Tomography Computed Tomography; Prostatic Neoplasms; Humans; Male; Prostatic Neoplasms/diagnostic imaging; Prostatic Neoplasms/pathology; Prostatic Neoplasms/surgery; Aged; Middle Aged; Glutamate Carboxypeptidase II/metabolism

13.
J Int Neuropsychol Soc ; : 1-6, 2024 Jan 24.
Article in English | MEDLINE | ID: mdl-38263747

ABSTRACT

OBJECTIVE: Self- and informant-ratings of functional abilities are used to diagnose mild cognitive impairment (MCI) and are commonly measured in clinical trials. Ratings are assumed to be accurate, yet they are subject to biases. Biases in ratings have been found in individuals with dementia who are older and more depressed and in caregivers with higher distress, burden, and education. This study aimed to extend prior findings by using an objective approach to identify determinants of bias in ratings. METHOD: Participants were 118 individuals with MCI and their informants. Three discrepancy variables were generated: the discrepancies between (1) self- and informant-rated functional status, (2) informant-rated functional status and objective cognition (in those with MCI), and (3) self-rated functional status and objective cognition. These variables served as dependent variables in forward linear regression models, with demographics, stress, burden, depression, and self-efficacy as predictors. RESULTS: Informants with higher stress rated individuals with MCI as having worse functional abilities relative to objective cognition. Individuals with MCI with lower self-efficacy rated their functional abilities as worse relative to objective cognition. Informant-ratings were worse than self-ratings for informants with higher stress and for individuals with MCI with higher self-efficacy. CONCLUSION: This study highlights biases in subjective ratings of functional abilities in MCI. The risk of relative underreporting of functional abilities by individuals with higher stress levels aligns with previous research. Bias in individuals with MCI with higher self-efficacy may be due to anosognosia. These findings have implications for the use of subjective ratings for diagnostic purposes and as outcome measures.

14.
Breast ; 73: 103599, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37992527

ABSTRACT

PURPOSE: To quantify interobserver variation (IOV) in target volume and organ-at-risk (OAR) contouring across 31 institutions in breast cancer cases and to explore the clinical utility of deep learning (DL)-based auto-contouring in reducing potential IOV. METHODS AND MATERIALS: In Phase 1, two breast cancer cases were randomly selected and distributed to multiple institutions for contouring of six clinical target volumes (CTVs) and eight OARs. In Phase 2, auto-contour sets were generated using a previously published DL breast segmentation model and made available to all participants. The difference in IOV of submitted contours between Phases 1 and 2 was investigated quantitatively using the Dice similarity coefficient (DSC) and Hausdorff distance (HD). The qualitative analysis used contour heat maps to visualize the extent and location of these variations and the modifications required. RESULTS: Over 800 pairwise comparisons were analysed for each structure in each case. Quantitative Phase 2 metrics showed significant improvement in the mean DSC (from 0.69 to 0.77) and HD (from 34.9 to 17.9 mm). Quantitative analysis showed increased interobserver agreement in Phase 2, specifically for CTV structures (5-19 %), leading to fewer manual adjustments. The underlying causes of IOV were investigated using a questionnaire and a hierarchical clustering analysis based on CTV volumes. CONCLUSION: DL-based auto-contours significantly improved contour agreement for OARs and CTVs, both qualitatively and quantitatively, suggesting a potential role in minimizing radiation therapy protocol deviations.


Subjects
Breast Neoplasms; Deep Learning; Humans; Female; Breast Neoplasms/diagnostic imaging; Radiotherapy Planning, Computer-Assisted/methods; Organs at Risk; Breast/diagnostic imaging

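DSC and HD, the two metrics used above, can be sketched on contours represented as sets of voxel (or surface-point) coordinates. This brute-force version is for illustration only, with hypothetical function names: it assumes coordinates are already scaled to millimetres, and it does not reproduce the study's pipeline or any particular toolkit's HD variant (for example, the 95th-percentile HD).

```python
from math import dist

Point = tuple[float, ...]


def dice(a: set[Point], b: set[Point]) -> float:
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for two sets of voxel coordinates."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty contours agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff(a: set[Point], b: set[Point]) -> float:
    """Symmetric Hausdorff distance: the largest distance from any point in one set
    to the nearest point in the other set (O(|A| * |B|) brute force)."""
    if not a or not b:
        raise ValueError("both contours must contain at least one point")

    def directed(src: set[Point], dst: set[Point]) -> float:
        return max(min(dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


# Hypothetical 1-D toy contours (coordinates in mm).
a = {(0, 0), (1, 0)}
b = {(1, 0), (3, 0)}
print(dice(a, b), hausdorff(a, b))  # 0.5 2.0
```

DSC rewards volumetric overlap, while HD is driven by the single worst boundary deviation; reporting both, as the study does, captures complementary aspects of contour disagreement.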
15.
Diagn Interv Radiol ; 30(2): 124-134, 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-37789677

ABSTRACT

PURPOSE: The reproducibility of relative cerebral blood volume (rCBV) measurements among readers with different levels of experience is a concern. This study aimed to investigate the inter-reader reproducibility of rCBV measurement of glioblastomas using the hotspot method in dynamic susceptibility contrast perfusion magnetic resonance imaging (DSC-MRI) with various strategies. METHODS: In this institutional review board-approved single-center study, 30 patients with glioblastoma who underwent DSC-MRI on a 3.0 Tesla scanner were retrospectively evaluated. Three groups of reviewers, comprising neuroradiologists, general radiologists, and radiology residents, calculated rCBV using different numbers of regions of interest (ROIs) and different reference areas. For statistical analysis of feature reproducibility, the intraclass correlation coefficient (ICC) and Bland-Altman plots were used. Analyses were performed among individuals, reader groups, reader-group pooling, and a population that contained all of them. RESULTS: For individuals, the highest inter-reader reproducibility was observed between neuroradiologists [ICC: 0.527; 95% confidence interval (CI): 0.21-0.74] and between residents (ICC: 0.513; 95% CI: 0.20-0.73). Reproducibility was poor in the analyses of individuals with different levels of experience (ICC range: 0.296-0.335) and in reader-wise and group-wise pooling (ICC range: 0.296-0.335 and 0.397-0.427, respectively). However, ICC values increased when five ROIs were used. Across all strategies, the ICC for the centrum semiovale as the reference area was significantly higher than that for contralateral white matter (P < 0.001). CONCLUSION: The inter-reader reproducibility of rCBV measurement was poor to moderate regardless of whether it was calculated by neuroradiologists, general radiologists, or residents, which may indicate the need for automated methods. Choosing five ROIs and using the centrum semiovale as the reference area may increase reliability for all users.
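Bland-Altman analysis, used above alongside the ICC, summarises paired readings by the mean difference (bias) and the 95% limits of agreement, bias ± 1.96 SD of the differences. A minimal sketch with a hypothetical function name and made-up rCBV readings from two readers:

```python
from statistics import fmean, stdev


def bland_altman(reader_a: list[float], reader_b: list[float], z: float = 1.96) -> tuple[float, float, float]:
    """Return (bias, lower limit of agreement, upper limit of agreement) for paired readings."""
    if len(reader_a) != len(reader_b) or len(reader_a) < 2:
        raise ValueError("need at least two paired readings")
    diffs = [a - b for a, b in zip(reader_a, reader_b)]
    bias = fmean(diffs)
    sd = stdev(diffs)  # sample SD of the differences (n - 1 denominator)
    return bias, bias - z * sd, bias + z * sd


# Hypothetical rCBV values for four tumours measured by two readers.
bias, lower, upper = bland_altman([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 2.5, 4.5])
print(bias, round(lower, 2), round(upper, 2))  # -0.25 -1.23 0.73
```

In a full analysis, the differences would also be plotted against the pairwise means to check whether disagreement grows with rCBV magnitude.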

16.
Placenta ; 145: 162-168, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38142649

ABSTRACT

INTRODUCTION: Reliability studies of placental examination have shown differing interobserver agreement for certain pathological features, a lack of uniform reporting criteria, and variable experience among pathologists. In previous analyses, we have shown that placental pathology differs by ethnicity. This validation study was performed to investigate whether bias related to ethnicity is a feature of placental pathology reporting in New Zealand (NZ). METHODS: For this audit-type study, 199 of 1726 eligible perinatal death cases between 2008 and 2017 were selected at random, including 51 cases each from South Asian, Maori, and NZ European mothers and 46 cases from Pacific mothers. Stored histology slides were blinded and re-examined by an experienced perinatal pathologist, and the findings were linked to the corresponding original pathology report. Interobserver agreement (overall, by ethnicity, and by gestational age) was described by proportional differences and kappa coefficients. RESULTS: Total interobserver agreement between the original placental reports and the validation review was 89.7 %, and it differed by pathological feature. There was generally more underreporting than overreporting (6.7 % vs. 3.6 %). Disagreement differed by ethnicity only for decidual vasculopathy (p = 0.03), whereas there were more differences by gestational age (villous morphology [p < 0.01], chorioamnionitis [p = 0.03], high-grade villitis of unknown etiology [p < 0.01], and placental haemorrhage [p = 0.03]). DISCUSSION: No systematic bias in placental pathology reporting in NZ was identified by ethnicity or gestational age, as the observed differences could be related to the underlying prevalence of pathology. We identified more underreporting than overreporting of pathology in the original reports, emphasizing the importance of placental investigation by specialised perinatal pathologists.


Subjects
Ethnicity, Pathology, Placenta, Female, Humans, Pregnancy, New Zealand, Placenta/pathology, Reproducibility of Results, Observer Variation, Pathology/standards
17.
Eur Radiol Exp ; 7(1): 65, 2023 10 24.
Article in English | MEDLINE | ID: mdl-37872406

ABSTRACT

BACKGROUND: We investigated whether a short, 5-min magnetic resonance imaging (MRI) protocol consisting of only axial T2-weighted and diffusion-weighted imaging (DWI) sequences can discriminate between tonsillar infections, peritonsillar abscesses and deeply extending abscesses in a retrospective, blinded, multireader setting. METHODS: We included patients with suspected pharyngotonsillar infections who were referred by emergency physicians and underwent emergency 3-T neck MRI from April 1, 2013 to December 31, 2018. Three radiologists (with 10-16 years of experience) reviewed the images for abscesses and their extension into deep neck spaces. Images were reviewed first using only axial T2-weighted Dixon images and DWI (short protocol) and second including the other sequences and contrast-enhanced T1-weighted Dixon images (full protocol). Diagnostic accuracy, interobserver agreement, and reader confidence were measured. Surgical findings and clinical course served as the reference standard. RESULTS: The final sample consisted of 52 patients: 13 with acute tonsillitis without abscess, 19 with peritonsillar abscesses, and 20 with deeply extending abscesses. Using the short protocol, diagnostic accuracy for abscesses across all readers was good to excellent: sensitivity 0.93 (95% confidence interval 0.87-0.97), specificity 0.85 (0.70-0.93), and accuracy 0.91 (0.85-0.95). Using the full protocol, the respective values were 0.98 (0.93-1.00), 0.85 (0.70-0.93), and 0.95 (0.90-0.97), not significantly different from the short protocol. Similar trends were seen for detecting deep extension. Interobserver agreement was similar between protocols. However, readers had higher confidence in diagnosing abscesses using the full protocol. CONCLUSIONS: The short MRI protocol showed good-to-excellent accuracy for tonsillar abscesses. Contrast-enhanced images improved reader confidence but did not affect diagnostic accuracy or interobserver agreement.
RELEVANCE STATEMENT: A short protocol consisting only of T2-weighted Dixon and DWI sequences can accurately image tonsillar abscesses, which may improve the feasibility of emergency neck MRI. KEY POINTS: • The short 3-T MRI protocol (T2-weighted images and DWI) was faster (5 min) than the full protocol including T1-weighted contrast-enhanced images (24 min). • The short 3-T MRI protocol showed good diagnostic accuracy for pharyngotonsillar abscesses. • Contrast-enhanced sequences improved reader confidence but did not impact diagnostic accuracy or interobserver agreement.
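To illustrate how reader-pooled figures like these arise: 39 patients had abscesses (19 peritonsillar + 20 deeply extending) and 13 did not, so three readers produce 117 abscess reads and 39 non-abscess reads. The per-reader counts in this sketch are hypothetical, chosen only to be consistent with the reported short-protocol rates (the abstract gives no counts). The sketch treats every read as independent and does not reproduce the study's confidence intervals.

```python
def pooled_metrics(reader_tables):
    """Pool per-reader (tp, fn, tn, fp) counts; return sensitivity, specificity, accuracy."""
    tp = sum(t[0] for t in reader_tables)
    fn = sum(t[1] for t in reader_tables)
    tn = sum(t[2] for t in reader_tables)
    fp = sum(t[3] for t in reader_tables)
    sensitivity = tp / (tp + fn)  # abscess reads correctly called abscess
    specificity = tn / (tn + fp)  # non-abscess reads correctly called non-abscess
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical tables: each reader sees 39 abscess and 13 non-abscess patients.
short_protocol = [(36, 3, 11, 2), (37, 2, 11, 2), (36, 3, 11, 2)]
sens, spec, acc = pooled_metrics(short_protocol)
print(f"{sens:.2f} {spec:.2f} {acc:.2f}")  # prints "0.93 0.85 0.91"
```

Because abscess reads outnumber non-abscess reads three to one, pooled accuracy sits closer to sensitivity than to specificity.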


Subjects
Peritonsillar Abscess, Humans, Peritonsillar Abscess/diagnostic imaging, Retrospective Studies, Contrast Media, Magnetic Resonance Imaging/methods, Diffusion Magnetic Resonance Imaging/methods
18.
Radiol Bras ; 56(4): 187-194, 2023.
Article in English | MEDLINE | ID: mdl-37829590

ABSTRACT

Objective: To assess the reliability of phase-sensitive inversion recovery (PSIR) magnetic resonance imaging (MRI) and its accuracy for determining the topography of demyelinating cortical lesions in patients with multiple sclerosis (MS). Materials and Methods: This was a cross-sectional study conducted at a tertiary referral center for MS and other demyelinating disorders. We assessed the agreement among three raters for the detection and topographic classification of cortical lesions on fluid-attenuated inversion recovery (FLAIR) and PSIR sequences in patients with MS. Results: We recruited 71 patients with MS. The PSIR sequences detected 50% more lesions than the FLAIR sequences. For detecting cortical lesions, interrater agreement was satisfactory, with a mean free-response kappa (κFR) coefficient of 0.60, whereas the mean κFR for the topographic reclassification of the lesions was 0.57. On PSIR sequences, the raters reclassified 366 lesions (20% of the lesions detected on FLAIR sequences), with excellent interrater agreement. There was a significant correlation between the total number of lesions detected on PSIR sequences and the Expanded Disability Status Scale (EDSS) score (ρ = 0.35; p < 0.001). Conclusion: PSIR sequences appear to perform better than FLAIR sequences for the detection and topographic classification of cortical lesions, with clinically satisfactory interrater agreement. In our sample of patients with MS, the PSIR MRI findings were significantly associated with disability status, which could influence decisions regarding the treatment of such patients.
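The free-response kappa is used when raters mark lesions rather than rate a fixed list of items, so true negatives cannot be counted. A minimal sketch, assuming the commonly stated form κFR = 2a / (2a + b + c), where a is the number of lesions marked by both raters and b and c are the numbers marked by only one of them. It also assumes lesions have already been matched across raters by a shared identifier (a hypothetical step here). The abstract does not say how the three-rater mean was formed; averaging over the three rater pairs is an assumption, and the lesion sets are invented.

```python
from itertools import combinations
from statistics import mean


def free_response_kappa(lesions_a, lesions_b):
    """kappa_FR = 2a / (2a + b + c) for two raters' sets of matched lesion IDs."""
    both = len(lesions_a & lesions_b)    # a: marked by both raters
    only_a = len(lesions_a - lesions_b)  # b: marked only by the first rater
    only_b = len(lesions_b - lesions_a)  # c: marked only by the second rater
    denom = 2 * both + only_a + only_b
    if denom == 0:
        raise ValueError("no lesions marked by either rater")
    return 2 * both / denom


def mean_pairwise_kfr(raters):
    """Assumed multi-rater summary: mean kappa_FR over all rater pairs."""
    return mean(free_response_kappa(a, b) for a, b in combinations(raters, 2))


# Hypothetical lesion IDs marked by three raters on one scan.
rater_1 = {"L1", "L2", "L3", "L4"}
rater_2 = {"L1", "L2", "L3", "L5"}
rater_3 = {"L1", "L2", "L6"}
print(round(mean_pairwise_kfr([rater_1, rater_2, rater_3]), 2))
```

Algebraically this form equals the proportion of specific positive agreement (the Dice overlap of the two lesion sets), which is why it needs no count of lesion-free regions.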


19.
Breast ; 72: 103578, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37713940

ABSTRACT

BACKGROUND: Normal tissue complication probability (NTCP) models can be useful to estimate the risk of fibrosis after breast-conserving surgery (BCS) and radiotherapy (RT) to the breast. However, they are subject to uncertainties. We present the impact of contouring variation on the prediction of fibrosis. MATERIALS AND METHODS: 280 breast cancer patients treated with BCS-RT were included. Nine Clinical Target Volume (CTV) contours were created for each patient: i) CTV_crop (reference), cropped 5 mm from the skin; ii) CTV_skin, uncropped and including the skin; iii) a contour segmenting the 95% isodose (Iso95%); and iv) contours from 3 different auto-contouring atlases, each generating an uncropped and a cropped contour (Atlas_skin/Atlas_crop). To illustrate the impact of contour variation on NTCP estimates, we applied two equations predicting fibrosis grade ≥ 2 at 5 years, based on the Lyman-Kutcher-Burman (LKB) and Relative Seriality (RS) models, respectively, to each contour. Differences were evaluated using repeated-measures ANOVA. For completeness, the association between observed fibrosis events and NTCP estimates was also evaluated using logistic regression. RESULTS: There were minimal differences between contours when the same contouring approach (cropped or uncropped) was followed. CTV_skin and Atlas_skin contours had lower NTCP estimates than CTV_crop (-3.92%, IQR 4.00, p < 0.05). No significant difference was observed for Atlas_crop and Iso95% contours compared with CTV_crop. For the whole cohort, NTCP estimates varied between 5.3% and 49.5% (LKB) or between 2.2% and 49.6% (RS), depending on the choice of contours. NTCP estimates for individual patients varied by up to a factor of 4. Estimates from "skin" contours showed higher agreement with observed events. CONCLUSION: Contour variations can lead to significantly different NTCP estimates for breast fibrosis, highlighting the importance of standardising breast contours before developing and/or applying NTCP models.
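In its textbook form, the LKB model maps a dose-volume histogram to a complication probability through the generalised equivalent uniform dose: gEUD = (Σ v_i D_i^(1/n))^n and NTCP = Φ((gEUD - TD50) / (m · TD50)), where Φ is the standard normal CDF. The sketch implements that form only; TD50, m and n are hypothetical placeholders, not the fibrosis parameters used in the study, and the DVH bins are invented. It illustrates one mechanism by which contour choice can shift NTCP: adding a low-dose region to a contour lowers gEUD and therefore the estimate. The Relative Seriality model is not shown.

```python
import math


def geud(doses_gy, volume_fractions, n):
    """Generalised EUD = (sum_i v_i * D_i**(1/n))**n for a differential DVH."""
    if n <= 0:
        raise ValueError("volume parameter n must be positive")
    total = sum(volume_fractions)
    # Normalise the volumes so they sum to 1 (the DVH may be given in cc or %).
    weighted = sum((v / total) * d ** (1.0 / n) for d, v in zip(doses_gy, volume_fractions))
    return weighted ** n


def lkb_ntcp(doses_gy, volume_fractions, td50, m, n):
    """LKB NTCP = Phi((gEUD - TD50) / (m * TD50)), Phi = standard normal CDF."""
    t = (geud(doses_gy, volume_fractions, n) - td50) / (m * td50)
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


# Hypothetical parameters and DVHs (NOT the study's values).
TD50, M, N = 60.0, 0.2, 0.1
cropped = lkb_ntcp([48.0, 50.0, 52.0], [0.2, 0.6, 0.2], TD50, M, N)
# Same contour plus a low-dose superficial (skin) region.
with_skin = lkb_ntcp([40.0, 48.0, 50.0, 52.0], [0.1, 0.18, 0.54, 0.18], TD50, M, N)
print(f"cropped {cropped:.3f}, with skin {with_skin:.3f}")
```

A uniform dose equal to TD50 gives NTCP = 0.5 by construction; the small parameter n makes gEUD sensitive mainly to the highest-dose bins.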


Subjects
Breast Neoplasms, Fibrocystic Breast Disease, Female, Humans, Radiotherapy Dosage, Breast Neoplasms/radiotherapy, Breast Neoplasms/surgery, Breast/diagnostic imaging, Radiotherapy Planning, Computer-Assisted, Probability, Fibrosis
20.
Clin Epidemiol ; 15: 957-968, 2023.
Article in English | MEDLINE | ID: mdl-37700930

ABSTRACT

Objective: To examine the agreement between emergency medical service (EMS) providers, neurology residents and neurology consultants using the Cincinnati Prehospital Stroke Scale (CPSS) and the Prehospital Acute Stroke Severity Scale (PASS). Methods: Patients with stroke, transient ischemic attack (TIA) or stroke mimics were included upon primary stroke admission or during rehabilitation, from June 2018 to September 2019. Video recordings were made of patients being assessed with the CPSS and PASS, and the recordings were later presented to the healthcare professionals. We used generalisability theory to determine relative and absolute interrater reliability, expressed as inter-rater agreement (IRA). Group-level agreement was determined against a gold standard, a consensus agreement between two neurology consultants, and presented as an area under the curve (AUC). Results: A total of 120 patient recordings were assessed by 30 EMS providers, two neurology residents and two neurology consultants. Using the CPSS and the PASS, EMS providers completed a total of 1,800 assessments, neurology residents 240 and neurology consultants 240. The overall relative and absolute IRA for all CPSS and PASS items combined was 0.84 (95% CI 0.80; 0.87) and 0.81 (95% CI 0.77; 0.85), respectively. Compared with the gold standard, group-level agreement using the CPSS resulted in AUCs of 0.83 (95% CI 0.78; 0.88) for the EMS providers and 0.86 (95% CI 0.82; 0.90) for the neurology residents. Using the PASS, the AUC was 0.82 (95% CI 0.77; 0.87) for the EMS providers and 0.88 (95% CI 0.84; 0.93) for the neurology residents. Conclusion: The high relative and absolute inter-rater agreement indicates that the two scales are robust and generalisable. High agreement across individual raters and different groups of healthcare professionals supports widespread applicability of the stroke scales.
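For a fully crossed one-facet design (every rater scores every recording), generalisability theory estimates variance components for patients (σ²_p), raters (σ²_r) and the residual (σ²_pr,e) from ANOVA mean squares. The relative coefficient σ²_p / (σ²_p + σ²_pr,e / k) ignores systematic rater offsets, while the absolute coefficient σ²_p / (σ²_p + (σ²_r + σ²_pr,e) / k) penalises them, with k raters per decision. This is a minimal sketch under that fully crossed assumption; the abstract does not describe how EMS providers were allocated to recordings, and the function name and scores are hypothetical.

```python
from statistics import mean


def g_coefficients(scores, k=1):
    """Relative and absolute G coefficients for a fully crossed patients x raters design.

    scores[p][r] is the score rater r gave recording p (no missing cells).
    k is the number of raters averaged per decision (k=1: single-rater agreement).
    """
    n_p, n_r = len(scores), len(scores[0])
    if n_p < 2 or n_r < 2:
        raise ValueError("need at least two patients and two raters")
    grand = mean(x for row in scores for x in row)
    p_means = [mean(row) for row in scores]
    r_means = [mean(scores[p][r] for p in range(n_p)) for r in range(n_r)]
    # ANOVA mean squares for patients, raters and the residual (p x r interaction + error).
    ms_p = n_r * sum((pm - grand) ** 2 for pm in p_means) / (n_p - 1)
    ms_r = n_p * sum((rm - grand) ** 2 for rm in r_means) / (n_r - 1)
    ss_res = sum(
        (scores[p][r] - p_means[p] - r_means[r] + grand) ** 2
        for p in range(n_p)
        for r in range(n_r)
    )
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))
    # Expected-mean-square estimators; negative estimates are truncated at zero.
    var_p = max((ms_p - ms_res) / n_r, 0.0)
    var_r = max((ms_r - ms_res) / n_p, 0.0)
    var_res = ms_res
    relative = var_p / (var_p + var_res / k)
    absolute = var_p / (var_p + (var_r + var_res) / k)
    return relative, absolute


# Hypothetical scores: rater 2 scores every recording one point higher than rater 1.
print(g_coefficients([[1, 2], [3, 4], [5, 6]]))
```

In this example the constant one-point offset leaves the relative coefficient at 1.0 but lowers the absolute coefficient to about 0.89, which is the distinction between the two IRA values reported above.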
